## **RESEARCH ARTICLE**

OPEN ACCESS

# Hardware Implementation of Two's Complement Multiplier with Partial Product Bypassing Technique

Abida Yousuf\*, Najeeb-ud-din\*\*, Ruhi Naaz Mir\*\*\*

\*(Department of ECE, National Institute of Technology, Srinagar)

\*\* (Department of ECE, National Institute of Technology, Srinagar)

\*\*\*(Department of Computer Science, National Institute of Technology, Srinagar)

## ABSTRACT

With the emergence of portable computing and communication systems, reducing power consumption has become one of the major objectives of VLSI design. Multiplication is an essential arithmetic operation in common DSP applications such as filtering, convolution, and the fast Fourier transform (FFT). To achieve high execution speed, parallel array multipliers are widely used. These multipliers tend to consume most of the power in DSP computations, so power-efficient multipliers are very important for the design of low-power DSP systems. This paper presents an approach to reducing the power consumption of a 2's complement multiplier, in which switching activity is reduced through dynamic bypassing of partial products.

#### I. Introduction

A multiplier takes two input operands and generates many partial products for subsequent addition operations, which in a CMOS circuit requires many switching activities. Switching activity within the functional units of a multiplier therefore accounts for the majority of its power dissipation, as given by

$$P_{avg} = \alpha \times C_L \times V_{dd}^2 \times f_{clk}$$

where $\alpha$ is the switching activity parameter, $C_L$ is the loading capacitance, $V_{dd}$ is the operating voltage, and $f_{clk}$ is the operating frequency. Minimizing switching activity can therefore reduce power dissipation effectively without affecting the circuit's operational performance. In addition, advances in technology make it possible to put more and more devices in the same silicon area while pushing the clock rate even higher. Low-power design is thus necessary to reduce packaging and cooling costs as well as to prolong the life span of integrated circuits (ICs) [1,2]. A technology-independent low-power design strategy reduces power consumption through a refined design process. An obvious method is to shut down part of a circuit when it is not in operation. Since dynamic power dissipation in a VLSI circuit is usually caused by signal transitions, our design minimizes the average power dissipation by reducing the switching activity of a given logic circuit [3,4].
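The linear dependence of average power on the switching activity can be illustrated with a minimal Python sketch of the expression above. The capacitance, voltage, and frequency values are hypothetical and chosen only for illustration; they are not measurements from this work.

```python
def average_dynamic_power(alpha, c_load, v_dd, f_clk):
    """Average dynamic power in watts: alpha * C_L * Vdd^2 * f_clk."""
    return alpha * c_load * v_dd ** 2 * f_clk


# Hypothetical operating point (assumed values, not from the paper).
C_LOAD = 10e-12   # loading capacitance: 10 pF
V_DD = 1.0        # operating voltage: 1 V
F_CLK = 100e6     # operating frequency: 100 MHz

baseline = average_dynamic_power(0.5, C_LOAD, V_DD, F_CLK)
reduced = average_dynamic_power(0.25, C_LOAD, V_DD, F_CLK)

print(f"alpha = 0.50: {baseline * 1e3:.3f} mW")
print(f"alpha = 0.25: {reduced * 1e3:.3f} mW")
```

With all other parameters fixed, halving $\alpha$ halves the average power, which is why the design targets switching activity rather than voltage or frequency.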

#### II. Related Research

In FPGA design, power reduction is possible only through reduced switching activity, that is, through lower dynamic power. Dynamic power consumption is generally defined as the power consumed while the clock is running and the external inputs are switching. Switching activity can be controlled at various levels of the design flow, and architectural decisions in the early design phases have the greatest impact [6].

For signals with high switching rates, delay balancing and reduction of logic levels are among the most efficient techniques for limiting the power penalty. An obvious method to reduce switching activity is to shut down the idle part of the circuit. A general (M × N) parallel multiplier operates by computing the partial products in parallel and then shifting and accumulating them. Reducing the switching activity of the components used in the design can minimize the power dissipation: if the k-th bit of the coefficient is zero, the k-th row or column of adders need not be activated. A conventional multiplier of this type, however, does not reduce switching, since its adders switch unnecessarily even when the k-th bit is zero [7,8].

#### III. Two's Complement Parallel Array Multiplier

In 2's complement multiplication, the partial products are adjusted so that the negative signs move to the last step, which in turn maximizes the regularity of the multiplication array. Consider a signed n-bit multiplicand A multiplied by a signed n-bit multiplier B to produce a signed 2n-bit product P [5]. The two's complement representations of A and B are

$$A = -a_{n-1}2^{n-1} + \sum_{i=0}^{n-2} a_i 2^i \tag{1}$$

$$B = -b_{n-1}2^{n-1} + \sum_{j=0}^{n-2} b_j 2^j \tag{2}$$

where the $a_i$ and $b_j$ are the bits of A and B, respectively, and $a_{n-1}$ and $b_{n-1}$ are the sign bits. The difficulty in two's complement multiplication lies in processing the sign bits of the multiplicand and multiplier. The Baugh-Wooley algorithm provides an efficient method to resolve this problem. The principle for an n × n two's complement multiplication is indicated below [5]:

$$P = A \times B = a_{n-1}b_{n-1}2^{2n-2} + \sum_{i=0}^{n-2} \sum_{j=0}^{n-2} a_i b_j 2^{i+j} + 2^{n-1} \sum_{i=0}^{n-2} \overline{b_{n-1}a_i}\, 2^i + 2^{n-1} \sum_{j=0}^{n-2} \overline{a_{n-1}b_j}\, 2^j - 2^{2n-1} + 2^n \tag{3}$$
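As a sanity check on the Baugh-Wooley decomposition, the following Python sketch evaluates the sign-free form term by term (the product of the sign bits, the unsigned partial products, the two complemented partial-product sums each added with weight $2^{n-1}$, and the constants $-2^{2n-1} + 2^n$) and compares it with ordinary signed multiplication. The function names `to_bits` and `baugh_wooley_product` are illustrative and do not come from the paper.

```python
def to_bits(value, n):
    """Return the n-bit two's complement bits of value, least significant first."""
    return [(value >> k) & 1 for k in range(n)]


def baugh_wooley_product(a, b, n):
    """Evaluate the Baugh-Wooley form of a * b for signed n-bit operands."""
    a_bits, b_bits = to_bits(a, n), to_bits(b, n)
    a_sign, b_sign = a_bits[n - 1], b_bits[n - 1]

    # Product of the two sign bits, weight 2^(2n-2).
    product = (a_sign & b_sign) << (2 * n - 2)

    # Unsigned partial products of the magnitude bits.
    for i in range(n - 1):
        for j in range(n - 1):
            product += (a_bits[i] & b_bits[j]) << (i + j)

    # Complemented partial products involving one sign bit, weight 2^(n-1).
    for i in range(n - 1):
        product += (1 - (b_sign & a_bits[i])) << (n - 1 + i)
    for j in range(n - 1):
        product += (1 - (a_sign & b_bits[j])) << (n - 1 + j)

    # Constant correction terms: +2^n - 2^(2n-1).
    product += (1 << n) - (1 << (2 * n - 1))
    return product


# Exhaustive comparison for 4-bit signed operands (-8 .. 7).
mismatches = [
    (a, b)
    for a in range(-8, 8)
    for b in range(-8, 8)
    if baugh_wooley_product(a, b, 4) != a * b
]
print("mismatches:", mismatches)
```

Because every subtraction of a variable partial product has been replaced by the addition of its complement plus a constant, the array can be built from ordinary adders only.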

The implementation of the Baugh-Wooley algorithm is shown in Figure 1. This structure handles 2's complement multiplication with some of the partial products replaced by their complements. The multiplier array consists of (n-1) rows of carry-save adders (CSA), in which each row contains (n-1) full adders (FA). The last row is a ripple adder for carry propagation. In this paper, we propose a low-power design for this multiplier.



Figure 1: (a) Parallel array 2's complement multiplier architecture (b) Multiplier cells

#### IV. Proposed 2's Complement Multiplier with Partial Product Bypassing Technique

The main idea of this technique is based on the observation that most modern multipliers produce a large number of signal transitions while adding zero partial products. If any bit of the multiplier is zero, the corresponding row or column of adders need not be activated, since the corresponding partial product is zero. The adders of conventional multipliers nevertheless sum the zero partial products and, as a result, exhibit redundant signal switching. The increased activity of the internal nodes results in unnecessary power dissipation. To disable an adder, the partial product of the previous adder must be passed directly to the next adder. Bypassing eliminates the unnecessary transitions by connecting inputs to outputs when the corresponding partial product is zero. Multiplexers at the output of each full adder pass the incoming partial sum directly to the next stage when the partial product is zero. Consider the multiplication shown in Figure 2, which executes 1010 × 1111. In the first and third diagonals (enclosed by dashed lines), two of the three input bits of each FA are 0: the carry bit from its upper-right FA and the partial product $a_i b_j$ (note that $a_0 = a_2 = 0$). As a result, the output carry bit of the FA is 0, and the output sum bit is simply equal to the third input bit, which is the sum output of its upper FA [9, 10].



Fig. 2. Partial product bypassing example

For each FA, the output sum bit is passed downward. Thus, when $a_j = 0$, the operations in column *j* can be skipped and the full adders in that column can be disabled, since their outputs are known.
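The observation behind bypassing can be checked directly with a small Python sketch: a full adder whose partial-product and carry inputs are both 0 simply forwards its remaining input, and the bypassable columns are those where the multiplicand bit is 0. The helper names `full_adder` and `zero_columns` are illustrative assumptions, not part of the original design.

```python
def full_adder(x, y, carry_in):
    """Return (sum, carry_out) of a one-bit full adder."""
    total = x + y + carry_in
    return total & 1, total >> 1


def zero_columns(multiplicand, n):
    """Indices j of the n-bit multiplicand for which a_j = 0."""
    return [j for j in range(n) if not (multiplicand >> j) & 1]


# With a zero partial product and a zero carry input, the adder is redundant:
# the sum output equals the incoming sum and the carry output is 0.
for sum_in in (0, 1):
    assert full_adder(0, sum_in, 0) == (sum_in, 0)

# Multiplicand from the 1010 x 1111 example.
print(zero_columns(0b1010, 4))  # → [0, 2]
```

For the example operand 1010, columns 0 and 2 can be bypassed, which matches $a_0 = a_2 = 0$ in the text.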

#### V. Modified Multiplier Adder Unit (AU) Design

With this technique, a higher power reduction is achieved when the operand contains more zeros [11]. The switching activity of the components used in the design depends on the input bit coefficients: if an input bit is zero, the corresponding row or column of adders need not be activated. The partial product bypassing multiplier is constructed as follows. First, the modified FA cell is shown in Figure 3. The tristate buffers, placed at the inputs of the adder cell, disable signal transitions in the adder cells that are bypassed. The output carry bit c is passed downward instead of to the right.

Therefore, when $a_j = 0$, the two inputs of the AU are disabled, and its output carry bit does not change. All three inputs of the FA are then fixed, which prevents its output from changing.
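The gating behaviour described above can be sketched as a behavioural Python model of a single adder unit. This is only an illustration under stated assumptions: the class name `AdderUnit`, the toggle counter, and the choice of the partial-product and sum inputs as the two gated inputs are hypothetical, and the model ignores timing and electrical effects.

```python
class AdderUnit:
    """Behavioural sketch of one bypassable adder cell (hypothetical names)."""

    def __init__(self):
        # Values held behind the tri-state buffers: (partial product, sum input).
        self.held = (0, 0)
        # Number of bit changes seen at the gated full-adder inputs.
        self.toggles = 0

    def evaluate(self, a_j, partial_product, sum_in, carry_in):
        """Return (sum_out, carry_out) for the current inputs."""
        if a_j:
            # Buffers enabled: new values reach the full adder.
            new = (partial_product, sum_in)
            self.toggles += sum(x != y for x, y in zip(new, self.held))
            self.held = new
        # The full adder always sees the held values plus the carry input.
        pp, s = self.held
        total = pp + s + carry_in
        fa_sum, fa_carry = total & 1, total >> 1
        # Output multiplexer: forward sum_in when the cell is bypassed.
        sum_out = fa_sum if a_j else sum_in
        return sum_out, fa_carry


unit = AdderUnit()
enabled = unit.evaluate(1, 1, 0, 0)    # a_j = 1: the adder computes 1 + 0 + 0
bypassed = unit.evaluate(0, 0, 0, 0)   # a_j = 0: sum_in is forwarded unchanged
print(enabled, bypassed, unit.toggles)
```

In the bypassed call the gated inputs keep their previous values, so the toggle count does not increase even though the external inputs changed.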



Fig. 3. Modified multiplier adder unit

Figure 4 shows the 4 × 4 array structure of the proposed column-bypassing multiplier. At the bottom of the array, the carry outputs must be set to 0; otherwise, the corresponding FAs may not produce the correct outputs, since their inputs are disabled. This is done by adding an AND gate at the outputs of the last-row CSA adders.
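The role of the AND gates can be illustrated with a functional Python sketch of a column-bypassing carry-save array. To keep the example short it models an unsigned n × n array rather than the Baugh-Wooley signed array of this paper, and the function name, the `stale_carry` parameter (the arbitrary value left on a disabled adder's carry output), and the active-adder counter are assumptions made for illustration.

```python
def full_adder(x, y, carry_in):
    """Return (sum, carry_out) of a one-bit full adder."""
    total = x + y + carry_in
    return total & 1, total >> 1


def column_bypass_multiply(a, b, n=4, stale_carry=1):
    """Unsigned n x n carry-save array with column bypassing.

    Returns (product, number of full adders that were active).
    """
    a_bits = [(a >> j) & 1 for j in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]

    # Row 0 holds the first partial products; there are no carries yet.
    sums = [a_bits[j] & b_bits[0] for j in range(n)]
    carries = [0] * n
    product = sums[0]  # product bit 0
    active = 0

    for i in range(1, n):
        next_sums, next_carries = [], []
        for j in range(n):
            # Sum arrives from the neighbouring column, carry from directly above.
            sum_in = sums[j + 1] if j + 1 < n else 0
            if a_bits[j]:
                s, c = full_adder(a_bits[j] & b_bits[i], sum_in, carries[j])
                active += 1
            else:
                # Bypassed cell: the multiplexer forwards sum_in; the carry
                # output is whatever the disabled adder last held.
                s, c = sum_in, stale_carry
            next_sums.append(s)
            next_carries.append(c)
        sums, carries = next_sums, next_carries
        product |= sums[0] << i  # product bit i leaves the array at column 0

    # AND gates at the bottom force the carries of bypassed columns to 0.
    carries = [c & a_bits[j] for j, c in enumerate(carries)]

    # Final carry-propagate addition of the remaining sums and carries.
    high = 0
    for j in range(1, n):
        high += sums[j] << (n - 1 + j)
    for j in range(n):
        high += carries[j] << (n + j)
    return product + high, active


print(column_bypass_multiply(0b1010, 0b1111))  # → (150, 6)
```

For 1010 × 1111 only 6 of the 12 array adders are active. Setting `stale_carry=1` shows that the result stays correct only because the bottom AND gates mask the carries of bypassed columns.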



Fig. 4. A 4×4 multiplier structure

#### VI. Results

The FPGA-based 2's complement multiplier was implemented on a Xilinx Virtex-5 device using the ISE 12.4 tool. Figure 5 compares 2's complement multiplication with and without the partial product bypassing technique in terms of delay.



Fig. 5. Comparison in terms of delay

Table 1 shows the comparison in terms of power consumption. The Xilinx XPower Estimator (XPE) tool is used to calculate the power consumed by the arithmetic circuit. These results are obtained for operands in which 0 and 1 each occur with probability 0.5. If the operands contain more 0's than 1's, that is, if the distribution is skewed toward 0's, the power saving is greater.

Table 1: Comparison in terms of power consumption between the Baugh-Wooley multiplier with and without the bypassing technique.

| Multiplier                            | 4-bit    | 8-bit     | 16-bit    |
|---------------------------------------|----------|-----------|-----------|
| Baugh-Wooley                          | 87.45 mW | 134.42 mW | 270.92 mW |
| Baugh-Wooley with bypassing technique | 83.43 mW | 125.23 mW | 246.92 mW |
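The relative saving implied by Table 1 can be computed with a few lines of Python. The power figures are copied from the table; the dictionary and function names are illustrative.

```python
# Power figures in mW, copied from Table 1.
baseline_mw = {4: 87.45, 8: 134.42, 16: 270.92}
bypass_mw = {4: 83.43, 8: 125.23, 16: 246.92}


def saving_percent(width):
    """Relative power saving of the bypassing design for a given bit width."""
    base = baseline_mw[width]
    return 100.0 * (base - bypass_mw[width]) / base


for width in sorted(baseline_mw):
    print(f"{width}-bit: {saving_percent(width):.2f}% lower power")
```

The saving is roughly 4.6% for 4-bit, 6.8% for 8-bit, and 8.9% for 16-bit operands, so the relative benefit of bypassing grows with the operand width in these measurements.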

#### VII. Conclusion

In this paper, we presented the hardware design and FPGA implementation of the partial product bypassing technique for a 2's complement Baugh-Wooley multiplier using VHDL. The design was implemented on a Xilinx Virtex-5 FPGA device using the ISE 12.4 design tool. The bypassing method saves power whenever there is a zero in the input operands of the multiplier.

#### **REFERENCES**:

- [1] R. M. Badghare, S. K. Mangal, R. B. Deshmukh, R. M. Patrikar, "Design of Low Power Parallel Multiplier," *Journal of Low Power Electronics*, vol. 5, no. 1, pp. 31-39, April 2009.
- [2] Muhammad H. Rais, Bandar M. Al-Harthi, Saad I. Al-Askar and Fahad K. Al-Hussein, "Design and Field Programmable Gate Array Implementation of Basic Building Blocks for Power-Efficient Baugh-Wooley Multipliers," *American Journal of Engineering and Applied Sciences*, pp. 307-311, 2010.
- [3] A. Hesham, "Technology scaling effects on multipliers," *IEEE Transactions on Computers*, vol. 47, no. 11, pp. 1201-1215, Nov. 1998.
- [4] F. Najm, "Transition density, a stochastic measure of activity in digital circuits," *Proceedings of 28th Design Automation Conference*, pp. 644-649, June 1991.
- [5] C. R. Baugh and B. A. Wooley, "A two's complement parallel array multiplication algorithm," *IEEE Trans. Comput.*, vol. C-22, no. 12, pp. 1045-1047, Dec. 1973.
- [6] M. Y. Kong, Langloi, Al. Khalili, "Efficient FPGA Implementation of complex using the logarithmic number system," *IEEE International Symposium on Circuits and Systems*, pp. 3154-3157, 2008.
- [7] S. Mahant-Shetti, P. Balsara, and C. Lemonds, "High Performance Low Power Array Multiplier Using Temporal Tiling," *IEEE Transactions on VLSI Systems*, pp. 121-124, Mar. 1999.
- [8] Ronak Bajaj, Saransh Chhabra, Sreehari Veeramachaneni, M. B. Srinivas, "A novel architecture for low-power design of parallel multipliers," *9th International Symposium on Communication and Information Technology*, 2009.
- [9] J. Ohban, V. G. Moshnyaga, and K. Inoue, "Multiplier energy reduction through bypassing of partial products," *Asia-Pacific Conference on Circuits and Systems*, vol. 2, pp. 13-17, 2002.
- [10] Ming-Chen Wen, Sying-Jyan Wang, and Yen-Nan Lin, "Low Power Parallel Multiplier with Column Bypassing," *IEEE Transactions*, pp. 1638-1641, 2005.
- [11] A. Wu, "High performance adder cell for low power pipelined multiplier," *Proceedings of IEEE International Symposium on Circuits and Systems*, vol. 4, pp. 57-60, May 1996.